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Abstract: Quite generally, constraint-based metabolic flux analysis describes the space of 
viable flux configurations for a metabolic network as a high-dimensional polytope defined 
by the linear constraints that enforce the balancing of production and consumption fluxes for 
each chemical species in the system. In some cases, the complexity of the solution space can 
be reduced by performing an additional optimization, while in other cases, knowing the range 
of variability of fluxes over the polytope provides a sufficient characterization of the allowed 
configurations. There are cases, however, in which the thorough information encoded in 
the individual distributions of viable fluxes over the polytope is required. Obtaining such 
distributions is known to be a highly challenging computational task when the dimensionality 
of the polytope is sufficiently large, and the problem of developing cost-effective ad hoc 
algorithms has recently seen a major surge of interest. Here, we propose a method that 



Metabolites 2013, 3 



839 



allows us to perform the required computation heuristically in a time scaling linearly with 
the number of reactions in the network, overcoming some limitations of similar techniques 
employed in recent years. As a case study, we apply it to the analysis of the human red 
blood cell metabolic network, whose solution space can be sampled by different exact 
techniques, like Hit-and-Run Monte Carlo (scaling roughly like the third power of the system 
size). Remarkably accurate estimates for the true distributions of viable reaction fluxes 
are obtained, suggesting that, although further improvements are desirable, our method 
enhances our ability to analyze the space of allowed configurations for large biochemical 
reaction networks. 

Keywords: metabolic networks; flux balance analysis; belief propagation algorithm 



1. Introduction 

The development of high throughput techniques now makes available a considerable number of high 
quality reconstructions of the metabolism of a variety of organisms, which include the stoichiometry of 
the biochemical reactions in the network and the underlying enzyme-gene associations [1-4]. Ideally, 
this information may be employed for kinetic modeling approaches that could shed light on issues, 
like the organization of a cell's metabolic phenotype and its robustness to perturbations (internal or 
exogenous) in a fully dynamical setting. Indeed, consider a metabolic network of iV coupled chemical 
reactions transforming M metabolites and let £ = {£f } (i = 1, . . . , N; fj, = 1, . . . , M) denote the 
stoichiometric coefficients, with the standard sign convention to distinguish substrates (£f < 0) from 
products (£f > 0) in each reaction, i. If one denotes by 7^ the rate of change of the intracellular level 
of species \i, due to exchanges between the cell's interior and the environment, then, under mass action 
kinetics, the intracellular concentration, c M , of metabolite [i obeys the equation: 



where 7 M > 0 (resp.7 M < 0) if there is a net out-take (resp. in-take) of species /i. 

Unluckily, addressing the above system in full generality requires knowledge about reaction 
mechanisms and kinetic constants (which specify how rates depend on concentrations), which is at 
best only partially available. (Besides, it is not entirely clear to us that, were that information fully 
at our disposal, simulating (1) for a genome-scale reconstruction involving thousands of reactions and 
metabolites would be a sensible thing to do). 

Computational studies of metabolic networks therefore generally assume that the cell operates at 
non-equilibrium steady-state (NESS) conditions, where the concentration of the metabolites is 
constant [5,6], with the rationale that as long as one is interested in the behaviour for time scales shorter 
than genetic ones (minutes), the much faster equilibrating biochemistry can be assumed to be "frozen" 
in an NESS. Under this assumption, computing NESS fluxes more modestly requires the prescription of 




(1) 



i=i 
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bounds for flux variability, i.e., x % E [m\ M % ] (which also encode for reaction reversibility assumptions), 
and for exchange rates, i.e., 7 M G [m M , M M ], with which conditions (1) can be written as: 

N 

m"<^£V<ikP, M = 1,...,M (2) 

i=i 

m ! <i'<M i , z = l,...,iV (3) 

This set of inequalities is fairly general, since it may include metabolites involved only in internal 
reactions (for which 7^ = 0), as well as exchanged species (i.e., with 7^ 7^ 0). The solution space 
of Equation (2) is, in turn, given by: 

S = fx e R N s.t. m 11 ^i xi < MM (/W = 1, • • • , M) and m 1 < x l < M { (i = 1, . . . , N)\ (4) 

The problem we address here concerns computational methods to sample S, in order to retrieve 
information on quantities like the distribution of the allowed values of each flux over the entire solution 
space. While most studies aiming at developing predictive power on metabolic phenotypes (especially 
for microbes) have been thus far based on coupling (2) with a phenotypic optimization principle (the 
solving of which requires no sampling of S), it is being increasingly recognized that the information 
entailed by the structure of S might provide key insight into different aspects of metabolic network 
analysis, from flux-flux correlations, to robustness, to perturbations, to (more subtly) optimal experiment 
design [7]. The task, however, presents many challenges, especially from the viewpoint of CPU costs. 
The running time of the paradigmatic algorithm to sample high-dimensional polytopes, such as S, 
namely Hit-and-Run Monte Carlo, is known to scale, in the best case, as the third power of the number 
of variables in the system [8], making it impractical for large enough networks. On the other hand, it 
has recently been shown that the heuristics of message-passing (MP) algorithms may provide a powerful 
alternative [9]. In brief, MP-based protocols are designed to compute solutions to statistical inference 
problems, like estimating marginals of random variables, exploiting peculiar topological properties of the 
underlying graphs that describe the interdependence of variables (reactions in our case) and constraints 
(metabolites in our case). Indeed, when such a graph is locally tree-like, i.e., it lacks short loops, 
statistical inference can be performed accurately by MP in linear time with the number of variables [10]. 
This property has been used in [9] to devise a highly efficient MP method to sample sets like S. Yet 
more recently [11], a second MP algorithm has been proposed to overcome some of the limitations of 
the method of [9], most notably, the inapplicability of the latter to real valued stoic hiometry and, perhaps 
more importantly, the accuracy problems that may arise when (some) fluxes are allowed to span several 
orders of magnitude. 

Here, we build on the work presented in [11] and apply a MP methodology, which we call weighted 
Belief Propagation (wBP), to the analysis of the solution space of metabolic reaction networks. In 
particular, we focus on the metabolism of the human red blood cell (hRBC), a major benchmark for 
sampling tools, as its size (N = 46, M = 34) makes it possible to characterize its solution space 
S by various methods and to compare the results. Specifically, we have evaluated results retrieved 
by wBP against those obtained by Hit-and-Run Monte Carlo to sample the polytope S for the hRBC 
(A note of caution here is needed regarding the words Monte Carlo, which are used throughout the 
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text, as well as in some of the references in a somewhat unspecific way. As is well-known, Monte 
Carlo techniques generically rely on repeated random sampling. Thus, Hit-and-Run is a Monte Carlo 
algorithm. The method used in [12] to sample a volume is also a Monte Carlo algorithm, albeit a different 
one, based on a rejection method. Methods to perform statistical inference by directly computing 
high-dimensional integrals, or sums, are often also Monte Carlo methods, as long as the integrals 
involved are evaluated by Monte Carlo integration. Therefore, by itself, Monte Carlo is perhaps too 
generic a term and does not reveal the specifics of an algorithm. The reader is advised to check the 
references carefully when comparing the various techniques). Because of the importance of Hit-and-Run 
as a mathematically controlled procedure with broad applicability, we have tried to devise a Monte 
Carlo method with computing times reduced as much as possible (the limit of standard Hit-and-Run 
techniques lying in their mixing time, known to scale cubically with the number of variables). The 
modified Hit-and-Run algorithm we have devised, based on a projection method, appears to be indeed 
optimized from this viewpoint with respect to, e.g., what was done in [13]. We call the resulting method 
the Kernel Hit-and-Run algorithm (KHR). As we will see, results obtained by wBP and KHR are in 
remarkable agreement among themselves (and with previous studies of the same problem by other, more 
costly, computational methods). 



2. Methodology 



2.1. Mathematical Statement of the Problem 



Suppose that we are interested in estimating the probability distribution functions (PDFs) Pi(x) for 
each reaction flux % — 1, . . . , N. By definition, they are given by: 



Pi(x) 



Vol(gj(g)) 
Vol(S) 



Si(x) = {x e S s.t. x l = x] 



(5) 



where Vol(S') is the volume of set S. Pi(x) can be mathematically written as an integral over all fluxes, 
but Xi, of a set of functions enforcing the constraints that define the poly tope, namely (1). Denoting by 
these functions (fi = 1, . . . , M), we can write Pi(x) for flux i as: 



P(x) 




where xu 



D 



X 



\i 
N 

i&i) 



[X 



X 



(6) 



denotes the collective integration variable, 



[m e ,M £ ] is the domain of integration (the set-product of the ranges of variability of 
all fluxes, except flux i) and = J dxPi(x) is a normalisation constant, so that each Pi(x) is properly 
normalised to one. Each indicator function F^ should distinguish between metabolites involved only 
in internal reactions (/i £ X for brevity) and metabolites that are exchanged with the surrounding. A 
convenient parameterisation is given by: 



(7) 
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where 8(y) is a Dirac ^-distribution and p M is an a priori distribution for the exchange rate 7 M . Under this 
choice, for intracellular metabolites, simply enforces the mass-balance constraint (1) with 7^ = 0 
in Equation (6) through a 5-function. For exchanged chemical species, the situation is slightly more 
complex. If the exchange rate takes a fixed value z 0 , then p^(z) = 5{z — z 0 ) and: 

F^(y) = 5(y-z 0 ) (8) 

corresponding, once inserted in Equation (6), to the mass-balance constraint (1) with 7^ = z 0 . If, 
however, the exchanged rate is known only probabilistically, then p^(z) can be a non-trivial distribution 
and F M enforces (1) in (6) by weighing all possible values of 7 M according to the measure p M . For 
instance, if there is no a priori information about the exchange rate, then p^iz) can be taken to be 
uniform. Note that when 7^ is a random variable, one can consider it as another unknown rate, so that 
one could also be interested in estimating its a posteriori distribution ^(7). The problem we want to 
face is that of computing quantities like Equation (6) for all i's. 

2.2. Weighted Belief Propagation 

To push mathematically forward expression (6), we need to do some type of approximation for S. 
Following [11], we assume the bipartite graph that describes the interdependency of reactions and 
metabolites to be locally tree-like. In such a case, we are supposing there are no (or only very long) 
cycles connecting the reactions that process a given metabolite p. (In the following, we shall write 
i E p to indicate that reaction i processes, either as a substrate or as a product, metabolite p.) Thus, if we 
imagine removing metabolite p from the system, all reactions i E p become (approximately) statistically 
independent, as they belong to separate branches of the metabolic (tree-like) network, and their joint PDF 
factorizes. This is explained pictorially in Figure la. If we now put metabolite p back, we see that, at 
a fixed value, x, of reaction i, the probability L^i(x) that mass balance holds for metabolite p can be 
expressed, in terms of the factorized PDF computed in absence of p, as (see Figure lb): 

v*o*o =t— [ dx »\i F » f 2 & v + & x n p ^M) (9) 

with L^i, a normalisation constant. In this formula, we use Latin labels (i, £,...) for reactions 
and Greek ones (p, is, . . .) for metabolites, while the script, i E p\i, denotes the reactions that 
process p except reaction i. Accordingly, we defined the shorthands, dx^i = YieeiAi ^ an( * 
DfAi = x^ 6At w[m^, M 1 ]. The quantity, P^ M (V), is the PDF of flux i taking a value x e , when metabolite 
p is removed. Those PDFs are, in turn, given by the probability, for each reaction i, to satisfy the mass 
balance conditions for all the metabolites they process, except p (see Figure lc), namely: 

Pi^(x) = \\ L„^i(x) (10 ) 

Here, the set v E i\p stands for all metabolites v, processed by reaction i, except p, and is a 
normalisation constant. Again, the above equations simply state the fact that, on locally tree-like graphs, 
the contributions to the PDFs coming from each node (reaction or metabolite) nicely factorize. 
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Figure 1. The method used to derive self-consistency Equations (9) and (10) for the 
conditional marginals; we only show the nearest neighbours of what one must imagine to 
be a large tree-like bipartite graph, where circles are reactions and squares, metabolites. 

(a) Metabolite /i and the reactions that process it (i E /i). If we assume removing /i from the 
system, all reactions connected to it belong to disjoint branches of the metabolic network, 
highlighted with the dashed lines. As a consequence, their joint probability distribution 
function (PDF) factorizes in the product of the marginals, P^ M (x), of each reaction i. 

(b) When metabolite /i is put back in the graph, the probability, L^i(x), of satisfying its 
mass balance condition when fixing the flux of reaction i to x depends on the marginals, 
Pi^^(x), of all neighbours, but i, and on the indicator function, F^. (c) The marginal 
Pi-+n{x), which is computed in absence of fi, expresses the probability that i satisfies the 
mass balance conditions for all the metabolites it processes (77 E i), except /1. On a tree, 
each mass balance condition is independent, so that the probability of satisfying all of them 
is given by the product of the various L v -+i{x). 



(a) (b) (c) 




The conceptual step of removing metabolites from the system is the key that allows us to recast the 
problem in the set of self-consistency Equations (9) and (10), for the conditional probabilities (the reader 
should keep in mind that this is, however, just a mathematical trick with no biological interpretation 
whatsoever [10]). Once the fixed point of the system formed by Equations (9) and (10) is known, one 
can compute the actual PDFs of the fluxes in the metabolic network as: 

P i( X ) = ^\\ L ^i( X ) (11) 

i 

where Pi is a normalisation constant. Note that Equation (11) also provides the recipe to evaluate the 
PDFs, Pn(j), for the exchange rates, 7 M , once the conditional marginals, P^^x 1 ), are known. 

As discussed in [9,1 1], the difficulties of the problem lie not so much in the derivation of Equations (9) 
and (10), but in devising an efficient method to solve them. In wBP, we tackle the issue by representing 
the marginals (10) through a collection of M variables and associated weights, rather than discretizing 
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them as one would normally do when facing a similar problem. Let us illustrate the idea with a fairly 
simple example. Consider the integral: 



0*0*0 = 77 / dy4> y (y) I dz<f) z (z)5{x + y + z - 1) (12) 

with the extra condition that x > 0, where </> y , 0 2 are known densities normalised in the interval [0, 1], 
and C is a normalisation constant. To evaluate (12), we could use Monte Carlo integration and draw Af 
pairs of random variables {(yi,Zi)}iLi according to the distributions, 4> y and <p z . Correspondingly, an 
estimate for 4> x (x) can be written as: 

, M _ E^i S(x + yi + Zi- 1)9(1 - y A - zf) 

<Px\X) - M 

Ei=i ©(i - v% - Zi) 

where the term, 0(1 — yi — zf), accounts for draws for which the quantity x = 1 — yi — Z{ must 
be rejected due to the condition x > 0. The latter condition indeed defines a feasible triangular 
region in the integration plane, yz (see Figure 2), such that every extraction (y^ zf) falling outside this 
domain must be rejected. This method (basically, naive Monte Carlo integration) is, hence, poised to 
be rather inefficient. Fortunately, we know precisely where the rejection region is, and we can rewrite 
Equation (12) as follows: 

'Pxi.x) = — J dycj) y (y) J dz(p z (z)5{x + y + z - 1) (14) 

Now, Equation (14) does not contain a rejection region, but we cannot apply Monte Carlo integration 
just yet, since (j) z (z) is not normalised in the interval [0, 1 — y\. Introducing the corresponding weight: 

w{y)= [ " dz<p z {z) (15) 



we can, however, re-cast Equation (14) in the form: 

I /•! ri-v 

<Px{x) = g J dy<f) y (y) J dz(f) z (z\y)w(y)8(x + y + z - 1) (16) 

with (f) z (z\y) = (f)(z)/w(y). The distributions appearing above are now properly normalized. Therefore, 
to evaluate Equation (16), we can simply draw pairs {(yi, zi)}^ according to (f> y (y) and <j) z (z\y), 
respectively, and estimate 4> x (x) by: 

N w( ■) 

4> x (x) = ^ aiS{x + yj + Zj - 1) , with a, = — ^ — l — — (17) 

i.e., by Af pairs of variables and weights {(x,; = 1 — yi — Zi, a i )}^ 1 . 

The key point of this method is that the reweighted density, (j> z {z\y), has a y-dependent support, such 
that rejection never occurs. Thus, at a price of computing a weight, w(y), we overcome the whole 
rejection issue, and the method becomes much more efficient. 

The great advantage of using wBP is that, at fixed Af, its running time goes as 0(2Nk), where k is 
the average number of metabolites processed by each reaction. Thus, as opposed to sampling techniques 
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that have normally super-linear mixing times [14], wBP only scales linearly with the number of reactions 
(see Figure 3), making it an ideal candidate for application to genome-scale metabolic networks. In the 
present work, we focus, however, on the relatively small case of the hRBC, so that we are able to compare 
with sampling methods that yield a uniform exploration of the solution space S [8]. Due to the nature of 
such methods (see next section), this type of comparison is still not feasible for larger systems. This, and 
the fact that previous results are available [9], make the metabolic network of the hRBC the ideal testing 
ground for wBP. 

Figure 2. Avoiding rejection. As explained in the text, from the original integration region, 
(y, z) G [0, 1] x [0, 1], only the one below the line z = 1 — y contributes to 4> x (x). However, 
in this lower triangle, the density, <p y {y)(j) z {z), is no longer normalised. This is easily dealt 
with by reweighting the integral. 




Figure 3. The running time t of the weighted Belief Propagation (wBP) algorithm vs. the 
number of reactions, N. For each value of N, we average here over 10 random synthetic 
metabolic networks, each having M = N/2 metabolites. The algorithm (blue circles) scales 
linearly with the system size; a linear function, t oc N (green dashed line), is plotted to guide 
the eye. 




1000 
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2.3. The Kernel Hit-and-Run (KHR) Algorithm 

In order to sample the solution space S and obtain exact PDFs of individual fluxes for the hRBC by a 
controlled method that guarantees uniformity, we have developed an optimized version of Hit-and-Run 
Monte Carlo, which we call the Kernel Hit-and-Run method. Let us start by re-writing constraints (2) 
explicitly for metabolites involved in internal reactions and the rest: 

^2^ = 0, neX (18) 
m"<^£V<ikP, (igl (19) 

m L <x l <M\ i = 1, . . . , N (20) 

We note that the set of \X\ equations in (18) defines the null-space of and geometrically corresponds 
to a family of hyperplanes passing through the origin x = 0. Let us denote the dimension of the null 
space of £ as K. Clearly, K would be at least iV — \X\ (actually K = N — \X\ when £ has full row 
rank, which can always be made to be the case and which we assume from now on). This means, 
obviously, that, although the number of variables in the system is N, due to the constraints in the model, 
the actual dimension of the solution space S is only K. As in real metabolic networks most reactions 
are internal, the dimension K of the null space will be significantly smaller than the original dimension 
of the problem N. Additionally, it turns out that the way to implement in practice such a dimensional 
reduction is quite straightforward: suppose that a basis of the null-space has been found, e.g., through 
Gaussian elimination or singular value decomposition (SVD), and let us denote as y = (y 1 , . . . ,y K ) 
the system of coordinates with respect to such a basis, so that we can write each flux in this basis as 
x % = J2f=i ^)U^ w i m ^ an N x K matrix related to the change of basis between the original space and 
the null subspace. Plugging this into Equations (19) and (20) allows us to write: 

K 

<J2 < Af " , /i <£ X 

"I (2D 
m * <J2®jV j < Mi , i = l,...,N 

3=1 

where we have defined the projected stoichiometric matrix, ^, with entries = Y^iLi^i®)- The 
set of Equations (21) defines a i^-dimensional polytope in the null space (see Figure 4), which can be 
sampled uniformly by using the Hit-and-Run algorithm [8,15,16]. Finally, to go back to the original 
space, that of the reaction rates, we simply use the fact that x % = J2f=i ^)V^ ■ The sampling properties 
of the Hit-and-Run algorithm under the uniform measure were indeed mathematically proven [8], and 
in our case, it is very easy to see that the uniform measure in the /^-dimensional null space is preserved 
under a linear transformation, so that the final sample in the full-dimensional space is also uniform 
by construction. 

While the sampling measure of KHR is well controlled, a word needs to be spent on the algorithmic 
mixing time. For the standard Hit-and-Run algorithm, this scales as the square of the dimensions times 
the diameter of the polytope, i.e., in practice, cubically with the number of dimensions [8,14]. Yet, as 
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mentioned before, in our approach, the dimension of the polytope is K, rather than N. This can yield a 
significant reduction in computation times if K is small compared to N, as will quite generally be the 
case. For the hRBC, for instance, we pass from iV = 46 (which can be problematic, e.g., for Monte 
Carlo rejection [17]), to a much more modest K = 12, which is sampled quite fastly by KHR. Note also 
that no additional constraints need to be introduced to enclose the polytope (as opposed to [13]). This is 
due to the fact that the M — |X| metabolites that are exchanged with the environment suffice to bound 
the polytope in the null space. 

Figure 4. A cartoonish representation of the polytope in a fC-dimensional null space spanned 
by ^/-coordinates. Here, the green dashed lines represent the set of hyperplanes (21) enclosing 
the polytope. 

i^-dimensional null space 



Vk 




Polytope 



.•\* 



y.j 



Finally, it is worth mentioning that the matrix <fr can be easily obtained with any standard algebra 
software or by standard SVD algorithms, and that no matrix inversion is required to compute the 
projected matrix ^ nor to convert the obtained sample to the original, full dimensional space. Therefore, 
to sum up, KHR uniformly samples the solution space S, with a mixing time that scales as 0(K 3 ). 

3. Results and Discussion 

We have applied the wBP and KHR algorithms to the study of the metabolic network of the hRBC. 
As mentioned in Section 2.2, such a choice is dictated by the fact that the hRBC size allows us to apply 
HR in a modest time and that previous results are available. We have used the same network considered 
in [9,12,17], which accounts for 34 metabolites, 32 internal reactions and 14 exchange reactions (see 
Figure 5). We are able to smoothly apply the method by using the effective reaction domains computed 
in [17] and used also in [9]. Such domains are derived starting from real enzymatic rates [18,19], so that 
they are physiologically meaningful, and span several orders of magnitude. Note that this would be a 
major issue if one were to discretize marginals (10), as it would require dealing with binning functions 
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defined on totally different scales. Thanks to our representation in terms of variables and weights, the 
fact that the reaction fluxes are of a different order of magnitude does not affect our method at all. 

We run the wBP algorithm by representing the marginals, like Equation (10), with sets of 
variables/weights {(x{, a i )} J ^L 1 of size M = 500. To solve the fixed point equations (9) and (10), we 
performed 30 iterations of our method. We started with uniform weights {ae} J ^ 1 and, at each iteration 
t = 1, . . . , 30, and for each fixed value of the variable x i? we applied wBP 10 3 x t times to evaluate 
the average weight aj. Once convergence was reached, we used the variable/weight sets to compute the 
final 46 PDFs, Pi(x) and ^(7), according to Equation (11). In this last step, we averaged the weight 
values over 10 5 wBP extractions to achieve a higher accuracy. We report the results in Figures 5 and 6, 
where we compare our method with KHR; the agreement is excellent. The reaction PDFs obtained with 
both methods have indeed a very similar domain and shape in most of the cases. Notably, wBP does 
not perfectly capture the profile of reactions involving currency metabolites, such as ATP, ADP, NADP 
and NADPH. An explanation of this may lie in the fact that these compounds are highly connected in 
metabolic networks and likely to be involved in small loops that are not considered by the wBP method. 

Figure 5. Results for human red blood cell. Here, we draw a pictorial representation of 
the system as a directed bipartite graph. Reaction nodes are plotted with their PDFs and 
metabolite nodes with green squares. Arrows entering (resp. leaving) a reaction stand for a 
substrate (resp. a product). We have plotted the marginals, Pi(x), for the internal reactions 
together with the P^(t) f° r me exchange rates (these are the leaves on the bipartite graph). 
For the densities, we have used the wBP method (red filled plots) and have compared them 
with the KHR algorithm (blue solid lines). 
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Figure 6. Results for human red blood cell. The probability density functions of the 
reaction rates; reaction names are the same as [9]. For the densities, we have used the 
wBP method (red filled plots) and have compared them with the Kernel Hit-and-Run (KHR) 
algorithm (blue solid lines). Note that the flux ranges span different orders of magnitude, 
but still, the profiles are very smooth for both weighted population dynamics and the Kernel 
Hit-and-Run algorithm. 
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Concerning the results obtained by KHR, we have been particularly careful to make sure we obtain 
a uniform distribution of the solution space S. As the uniform measure is guaranteed to be a limiting 
distribution, the difficulty lies, of course, in deciding when such a limit has practically been reached in the 
simulations. In this regard, we decided to apply three conservative measures to ensure this: first of all, we 
checked that the initial conditions did not affect the results and, after that, the simulations were averaged 
over the initial conditions (the initial points were generated using the MinOver algorithm [20]); secondly, 
the initial time-window was disregarded in the sampling; and finally, in order to avoid correlation 
effects in the sampling of the PDFs, we only recorded points at large spacing in time. In Figure 6, 
we report a panel with all the PDFs for the hRBC network to make the comparison between the methods 
more straightforward. 



4. Conclusions 



In this work, inspired by techniques employed in the statistical mechanics of disordered systems, we 
have presented a novel method to estimate distributions of reaction fluxes in constraint-based models of 
metabolic networks. The wBP methodology has, in our view, clear advantages when compared with 
alternative approaches. If compared to rejection-based Monte Carlo methods [12] or even to more 
refined sampling approaches [13], our algorithm has the significant advantage of scaling linearly with the 
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system size, a feature that makes it particularly suitable to study genome-scale metabolic reconstructions. 
Comparing wBP with similar MP-based approaches [9], our method turns out to be unaffected by the 
flux ranges spanning several orders of magnitude nor by the stoichiometric coefficients taking real 
values. Note that the former property is particularly important for the study of real metabolic systems in 
physiologic conditions, as enzymatic rates can vary wildly across the network [12,17]. 

wBP can also be integrated with optimization-based flux balance analysis (FBA), as it easily allows 
us to evaluate the PDFs of the enzymatic rates close to optimality (assuming a score function is known) 
by just injecting a priori to keep fluxes "close" to their optimal value. 

We have also compared the performance of wBP against the KHR method. The latter is a controlled 
Hit-and-Run Monte Carlo taking place in the null space defined by the set of internal reactions, where 
a considerable effective dimensional reduction can be achieved. Indeed, starting from the original 
iV-dimensional space of solutions, one can wind up into a space that, in the best of cases, has the 
same number of dimensions as the number of exchange reactions. Given the considerable gains that this 
observation provides, we believe that this method may be worth being explored further in its own right. 

The validation of the wBP algorithm in the hRBC network, which can be considered a benchmark 
for the sampling problem for constrained metabolic models, opens the door to future applications of the 
method to more relevant organisms, such as Mycoplasma pneumoniae [21] ox Escherichia coli [2]. In our 
opinion, the fact that its algorithmic complexity scales linearly with the number of reactions, combined 
with the aforementioned ability to deal with more realistic bounds, makes it a highly promising candidate 
for effective solutions to this challenging task. 
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